ABSTRACT



The software according to the invention incorporates a 
glossary management tool that makes it easy for each client 
to customize terminology to the needs of a particular busi- 
ness. With this tool, termed a glossary manager, a company 
can customize a number of feature names in the system to 
provide a more familiar context for their users. A system 
administrator can also customize the manner in which 
"thumbnail" or "preview" images are presented. The system 
performs clustering on search queries, and searches media 
records multi-modally, using two or more approaches such 
as image searching and text searching. An administrator can 
tune search parameters. Two or more streams of metadata 
may be aligned and correlated with a media file, facilitating 
later searching. The system evaluates itself. It folds popu- 
larity information into rankings of search results. 

11 Claims, 4 Drawing Sheets 
and if two or more parts of speech are possible for a
particular word, it is tagged with both. After tagging, word
affixes (i.e. suffixes) are stripped from query words to obtain
a word root, using conventional inflectional morphology. If
a word in a query is not known, affixes are stripped from the
word one by one until a known word is found.
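
The fallback described above, stripping affixes one at a time until a
known word is reached, might be sketched as follows. The suffix list
and the tiny vocabulary are illustrative assumptions, not part of the
described system, and a real inflectional morphology would also handle
spelling changes (this sketch would not reduce "skating" to "skate").

```python
# Hypothetical suffix list and vocabulary, for illustration only.
SUFFIXES = ("ing", "ed", "es", "s", "ly", "er")
KNOWN_WORDS = {"skate", "walk", "quick", "paint"}


def find_known_root(word, known=KNOWN_WORDS, suffixes=SUFFIXES):
    """Strip one suffix at a time until a known word is found.

    Returns the known root, or None if no known word is reached.
    """
    current = word.lower()
    while current not in known:
        for suffix in suffixes:
            # Require a non-empty remainder after stripping.
            if current.endswith(suffix) and len(current) > len(suffix):
                current = current[: -len(suffix)]
                break
        else:
            return None  # no suffix left to strip
    return current
```

Each pass shortens the word, so the loop always terminates, either at a
known word or when no listed suffix remains.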

An intermediate query is then formulated to match against 
the file index database. Texts or captions that match queries 
are then returned, ranked, and displayed to the user, with 
those that match best being displayed at the top of the list.
In an exemplary system, the searching is implemented by 
first building a B-tree of ID lists, one for each concept in the 
text database. The ID lists have an entry for each object 
whose text contains a reference to a given concept. An entry 
consists of an object ID and a weight. The object ID provides
a unique identifier and is a positive integer assigned when 
the object is indexed. The weight reflects the relevance of the 
concept to the object's text, and is also a positive integer. 

To add an object to an existing index, the object ID and 
a weight are inserted into the ID list of every concept that is
in any way relevant to the text. For searching, the ID lists of
every concept in the query are retrieved and combined as
specified by the query. Since ID lists contain IDs with
weights in sorted order, determining existence and relevance
of a match is simultaneous and fast, using only a small
number of processor instructions for each concept-object 
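
As a rough illustration of the ID-list scheme, the sketch below keeps
one sorted list of (object ID, weight) entries per concept and
intersects lists with a single merge pass. A plain dictionary stands in
for the B-tree, and summing weights is an assumed way to combine
relevance; neither detail is specified in the text.

```python
import bisect
from collections import defaultdict


class ConceptIndex:
    """Maps each concept to a list of (object_id, weight), sorted by ID."""

    def __init__(self):
        # A dict stands in for the B-tree of ID lists (assumption).
        self.id_lists = defaultdict(list)

    def add_object(self, object_id, concept_weights):
        """Insert the object into the ID list of every relevant concept."""
        for concept, weight in concept_weights.items():
            bisect.insort(self.id_lists[concept], (object_id, weight))

    def search_all(self, concepts):
        """Return (object_id, combined_weight) for objects matching every
        concept, best match first. Weights are summed (assumption)."""
        lists = [self.id_lists.get(c, []) for c in concepts]
        if not lists:
            return []
        result = lists[0]
        for other in lists[1:]:
            result = _intersect(result, other)
        return sorted(result, key=lambda entry: (-entry[1], entry[0]))


def _intersect(a, b):
    """Merge two ID-sorted lists, keeping IDs present in both."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i][0] == b[j][0]:
            out.append((a[i][0], a[i][1] + b[j][1]))
            i += 1
            j += 1
        elif a[i][0] < b[j][0]:
            i += 1
        else:
            j += 1
    return out
```

Because both lists are sorted by object ID, the merge decides whether an
object matches and accumulates its weight in the same step.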

Search Technologies. The system allows users to search 
for media files with many different types of search queries.
For example, users may submit search queries by speaking 
them, typing them, copying them, or drawing them. 

The process of locating a particular file in a large archive
is a special area for innovation within the inventive software.
Files are characterized in several ways. First, they have an
identifier, generally similar to a filename, which is unique
within the system and makes it possible to link up all the
objects related to a file. These can include the actual
high-resolution asset, lower-resolution thumbnails or other
proxies for browsing, and information about the file, or metadata.
Searching can be performed on the file identifier, or it can be
performed on the metadata. In the case of metadata searching,
it is desirable to offer search alternatives that go beyond
the exact matching process involved in a standard keyword
search.

Some systems use controlled vocabulary searching as an
optimization of keyword searching. Keyword searches simply
match exactly on any word in the user's search query
that appears in the search target. (In the system according to
the invention, the search target is the metadata describing a
media file.) The set of potential keywords is quite large (as
large as the vocabulary of English, or whatever language(s)
are being used). If there are no limitations on the search
vocabulary that can be employed, a user can enter a search
for puma and fail to find any files captioned as mountain lion
or cougar, even though they all refer to the same thing.
Controlled vocabulary is an attempt to address this problem,
albeit at considerable cost. In a controlled vocabulary
retrieval system, cataloguers all agree to use the same terms.
In practical terms, this implies that, when cataloguing, they
must check their controlled vocabulary lists and be sure not
to deviate. Sometimes tools can be built to aid in this
process, depending on the size of the controlled vocabulary.
Similarly, tools can also be provided to searchers to control
their search requests. However, controlled vocabulary systems
do not scale beyond a few thousand terms, since it is
impractical to look up every word in English for every
search. For broader retrieval systems, for faster cataloguing,
and for simpler searching, a different approach is superior.

In addition to standard keyword and Boolean searching, 
the system software incorporates additional advanced tech- 
nology for locating stored files. Rather than limiting search- 
ing to a controlled vocabulary, the system software includes 
natural language search, which allows cataloguers and users 
to employ any words in English (or whatever natural lan- 
guage the retrieval system is using). 

Natural language search incorporates: 

a semantic network of concepts 

additional linguistic techniques, including: 

phrase matching 

derivational morphology, in lieu of stemming 
part of speech tagging 
name recognition 
location recognition 

User-tunable Search Parameters. The system according to 
the invention provides a screen for customers to adjust 
search parameters, to reflect their company use of stored 
media file collections. This is shown in FIG. 3. While the 
parameters may themselves be well-known in a searching 
system, what is emphasized here is that the user (or, more 
likely, an administrator) can be granted access to such 
fundamental decisions about search as: 

(a) how good a match has to be before it is displayed to 
the user, e.g. 50%, and 

(b) how "creative" the search should be, i.e. how much 
should the search terms be expanded to include more distant 
synonyms and related terms. 
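
The two tunable parameters can be pictured as a threshold on match
score and a limit on how far term expansion may travel through related
terms. The sketch below assumes a toy table of related terms with
distances and a 0 to 100 scoring scale; the names SearchSettings,
min_match_percent and creativity are hypothetical.

```python
from dataclasses import dataclass

# Toy relatedness table: term -> {related term: distance}. Illustrative only.
RELATED = {
    "puma": {"cougar": 1, "mountain lion": 1, "big cat": 2, "animal": 3},
}


@dataclass
class SearchSettings:
    min_match_percent: int = 50   # (a) how good a match must be to be shown
    creativity: int = 1           # (b) maximum distance of expansion terms


def expand_terms(term, settings):
    """Return the term plus related terms within the allowed distance."""
    related = RELATED.get(term, {})
    extra = [t for t, dist in related.items() if dist <= settings.creativity]
    return [term] + sorted(extra)


def filter_results(scored_results, settings):
    """Keep (item, score) pairs whose score meets the display threshold."""
    return [(item, score) for item, score in scored_results
            if score >= settings.min_match_percent]
```

Raising creativity admits more distant terms such as "big cat", while
raising min_match_percent hides weaker matches from the user.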

It should be borne in mind that the system according to the 
invention can be carried out on an internet, meaning an 
IP-based network, and in particular may be carried out on 
the Internet, meaning the global IP-based network. 

Multimodal Search. Currently, search methods focus on 
textual input. The current invention incorporates new search 
techniques, and combines them in novel ways. 

Image search is becoming useful in commercial applica- 
tions. In the system according to the invention, user search 
input is provided in a new way. Users may wish to select an 
existing image as example input, so that a search consists of 
"Give me more images like this." Perhaps even more useful 
is the ability to select part of an image, analogous with "Give 
me more like this part." In the system according to the 
invention, identifying the part may be done in either of two 
exemplary ways: 

1. Touch screen: user touches the screen to identify the
portion of the image that feeds into the search. 

2. Markup, using pen or other screen drawing metaphor, 
including through the system media viewer, which is 
described in more detail below. 

In addition, search modalities can be combined. This novel
approach to search is particularly applicable to multimedia. 
Examples of combined, or multimodal, searches, include: 

touch screen and text 
drawing and voice 
drawing and touch screen 

Vocabulary Management. Ideally, the semantic net of 
concepts is quite large and attempts to incorporate every 
word or term in English (or other language being used for 
cataloguing and searching). No matter how large the seman- 
tic net may be, there will be a periodic need to expand or edit 
it. New words appear in English periodically, and, although 
many may be slang and therefore not particularly important
in context, some will be real new words and will
be important enough to include. For example, rollerblading
and in-line skating are relatively new terms in English, and
depicting those actions is useful in advertising. So the terms
need to be added to the semantic net. Semantic net/vocabulary
maintenance is generally a manual process, particularly
where the user has an existing media library with a thesaurus
and vocabulary management process. Such maintenance can
also be performed automatically.

To maintain a vocabulary for an information retrieval
application that accepts user queries in natural language, a
user maintaining a semantic net would track search queries
in a query log. From the query log, he would determine
which words are actually novel and are candidates to be
added to the system vocabulary, by expanding the query log
using morphology, and possibly a spell checker and name
identifier. The remaining terms that were not matched are the
basis of a list for adding terms to the vocabulary.

Tools to manage vocabulary include:

A morphological analyzer. This tool strips off any endings
and morphological alterations in a query to find the
stem, and checks to see if the stem is in the current
vocabulary. If the stem is not, the user doing the
maintenance might try:

A spell checker. This tool uses conventional algorithms
to see if the supposedly new word is actually a
misspelling of a known word. If it is not a misspelling,
the user might try:

A name identifier. This tool checks to see if the supposedly
new word is in a name configuration, in that it
follows a known first name in the query. If it does, it is
added to a candidate name database. If it is not, it is
proposed as a possible new word to be added to the
system's vocabulary.

Searching Audio/Video by Timecode Correlation with
Search Criteria. Video and audio files can be timecoded, or
marked such that the software in which they run can locate
a specific frame (for videos) or measure (for audio) at any
time. Importantly, the system according to the invention
permits searching time-based media, including video and audio
files, by combining two search elements. The first is a
standard search, including but not limited to natural language
search. The second is a time indicator, such as a
SMPTE (Society of Motion Picture and Television Engineers)
standard timecode. Face recognition is an additional
technology that can be used in searching. Face recognition
is a subset of the more general technology of object
recognition, and indeed techniques described here may extend to
additional technologies as well.

The current state of the art in face recognition technology
makes it possible to take a manually created, labeled library
of faces, and match faces from a video to that library. For
example, a user might work with a news video and use a face
recognition program to label Nelson Mandela in it. The
output of the face recognition program would be a time-coded
segment, with start and stop times, of when Nelson
Mandela was on camera, with the label "Nelson Mandela"
attached to the time codes. While face recognition currently
does not achieve 100% precision or recall, it can still be
useful. For example, one known system offers a contract
rights management capability for films that demands time-coded
segments with names attached, and assumes that users
will create those manually, so that the correct contract
restrictions for each film segment can be attached to the right
time codes. Given a small library of the actors in a film, it
would be possible to do a fast, automated match-up of time
codes and actors, even with imperfect face recognition
technology. Selecting the correct actor from forty publicity
shots would be much simpler than selecting from among
thousands of faces.

Importantly, the system according to the invention carries
out the automated creation of the face library. Required
elements include time-coded metadata (for example, the
voice recognition transcript of a video), and the ability to
find the names of people in text. Each time a face and a
person's name appear at the same time code, that occurrence
is a potential new entry for the face library. A user may run
the facematcher for thousands of hours and sift out the
recurring matches as the most likely. In this way, a reference
library of faces is created, and new material can be
catalogued automatically.

The software according to the invention approaches this
by using alignment techniques to match up two or more
streams of metadata. For example, a broadcast news program
may contain closed captioning for the hearing-impaired.
It may also contain a separate description of the news
footage, probably created manually by the news department.
The system according to the invention uses alignment to
match the description, which is not time-coded, with the
closed captioning, which is time-coded. This process allows
the system to add time codes to the non-time-coded stream.
The software then uses the stream with newly added time
codes to search for proper names within it. At the same time,
using face recognition algorithms on the video stream, the
software finds faces. The system tries to match up the faces
with the proper names that describe who they are. This
matched set provides us with a rough cut of a face (or object)
reference library. This is exemplified in FIG. 4.

Face recognition can also be employed to manage the
library or archive of media files. Media libraries are
assembled over time, often from disparate sources, and may
contain multiple copies of a single media file, either with the
same metadata or with different metadata. Duplicate detection
is therefore an important element of library and archive
management, and face recognition (and, more generally,
image recognition) can be leveraged to provide that capability.
More broadly, for video, scene detection technology
can assist in the process of identifying duplicates so that they
can be purged from the library.

Clustering and Other Ways to Determine Stored File
Usage. Clustering involves combining user search queries in
such a way that the searches can be analyzed usefully to
provide answers to business questions. Clustering has
received considerable attention in document information
retrieval (IR) and more recently, in video IR as a means of
refining retrieval results based on user preferences or profiles,
and to characterize the marketplace. The prior art
contains many examples of clustering applied in information
retrieval systems, but they all apply to search results
returned to users rather than search queries submitted by
users. In the system according to the invention, we cluster
search queries by topic. We then use that information to
adjust the collections of stored files so that the file collections
will better meet users' needs.

This system characterizes the information needs of groups
(and subgroups) of users, with respect to a collection of
media files (e.g. images, videos, sound clips, text, multimedia
objects). Some common groupings include:

search queries that brought back no files

search queries that brought back no files the user was
interested in

search queries that lead to expressions of user interest or
sales
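
One way to picture these query groupings is as a classification over a
query log. The record fields used below (results_returned,
items_of_interest) are assumed for illustration; the text does not
specify a log format.

```python
from collections import defaultdict


def group_queries(query_log):
    """Bucket logged queries by outcome.

    Each record is a dict with hypothetical keys:
      query             - the query text
      results_returned  - number of files returned
      items_of_interest - files the user saved, carted, or bought
    """
    groups = defaultdict(list)
    for record in query_log:
        if record["results_returned"] == 0:
            groups["no_files"].append(record["query"])
        elif record["items_of_interest"] == 0:
            groups["no_files_of_interest"].append(record["query"])
        else:
            groups["interest_or_sale"].append(record["query"])
    return dict(groups)
```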
This system applies clustering technology to user-submitted
search queries, and to the files retrieved in search results.
It also includes: 

machine learning as applied to the above. 

characterizing the information needs over time, by user 
type, or by other factors. 

methods for reporting file collection needs to interested 
parties (for example, media suppliers). The system 
informs a supplier that pictures of earthquakes are 
selling briskly, or that users keep looking for videos of 
dance performances but cannot find any. 

methods for adjusting file collections based on the results 
of the clustering analysis, above. 

novel clustering techniques. These include using a seman- 
tic expansion (such as the WordNet hierarchical the- 
saurus) and phrase identification (such as noun phrase, 
and name and location identification) as the basis for 
the clustering. 

Before user queries can be analyzed, they must be 
expanded to a "common denominator". To expand the user 
search queries, we use natural language techniques. Specifi- 
cally, we treat each query as if it were metadata within our 
system, as described in the NLP section, above. Each query 
is expanded through the application of a semantic net, so that 
it contains synonyms and related terms. This expansion 
provides a set of features to which we can apply standard 
clustering technology. 

Obtaining valuable information on user preferences for 
stored files begins with deciding what information a client 
wants to understand. The data set can be selected according 
to various criteria, including: 

queries from a particular subset of users (e.g. registered 
users, users by industry, new users) 

queries that lead to success (sale or other indication) 

queries that lead to failure 

A first step is to select the data set on which clustering is
to be performed. In an information retrieval (IR) context,
clustering can be performed on queries or on assets to be
retrieved (documents, images, video, audio, mixed media).
A sample query set may include short queries, as is standard
on Web searches, long queries, as seen in TREC (U.S.
government-sponsored text retrieval conferences), or queries
produced by QBE (query by example), in which an asset or
set of assets is itself inserted into a query.

A second step is to perform analysis on the queries using, 
for example, linguistic methods, such as: 
Tokenization: determine word/token boundaries. In
English, tokenization mostly coincides with spaces
between words, with certain complications
(Alzheimer's = 1 token vs. she's = 2 tokens).

Morphology or stemming: remove tense and plural
markers and other affixes to find the word root.

Identify names, locations, noun phrases: using a pattern
matcher or other methodology, determine word
groupings for special handling. For example, for
names, match certain kinds of variation; for locations,
match subsets; for noun phrases, weight complete and
head matches higher than modifier-only matches.

A third step is to expand the queries. Ideally, this step
includes expansion using a thesaurus or semantic net of
synonyms, supertypes, and other relationships.

A fourth step is to assign a weight to each of the terms in
each expanded query, based on how close that term is to the
original query. The exact weighting will vary by application,
but the basic understanding is that more closely matching
terms are weighted close to 100.
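
Steps two through four might look like the following sketch. The
thesaurus entries and the specific weights (100 for original terms,
lower values for synonyms and broader terms) are assumptions chosen
only to show the shape of the computation, and tokenization is reduced
to splitting on whitespace.

```python
# Hypothetical thesaurus: term -> list of (related term, weight out of 100).
THESAURUS = {
    "puma": [("cougar", 90), ("mountain lion", 90), ("cat", 60)],
    "car": [("automobile", 90), ("vehicle", 60)],
}


def expand_query(query):
    """Tokenize a query and return {term: weight} for the expanded query."""
    weights = {}
    for token in query.lower().split():      # naive whitespace tokenization
        weights[token] = 100                  # original terms weigh the most
        for related, weight in THESAURUS.get(token, []):
            # Keep the highest weight if a term is reached more than once.
            weights[related] = max(weights.get(related, 0), weight)
    return weights
```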



7 7,879 B2 


A fifth step is to create a vector for each expanded query. 
In order to apply a statistical clustering algorithm, we 
arrange the vectors into a matrix. 

A sixth step is to apply a statistical clustering algorithm in
order to group similar queries together. A hierarchical or
linear clustering strategy may be used, depending on
whether the desired clusters are hierarchical or not.
Clustering may allow overlap, which means that a query may
appear in more than one cluster.

A seventh step is to apply the clustering algorithm until
the stopping condition is met, e.g. the desired number of
clusters is obtained, or a combination of cluster number and
desired distinctiveness of clusters is reached.
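
Steps five through seven can be sketched with a small agglomerative
procedure. This is one possible choice, not the algorithm the text
prescribes: it assumes cosine similarity between query vectors,
average-link merging, no overlap between clusters, and a stopping
condition based only on the desired number of clusters.

```python
import math


def build_matrix(expanded_queries):
    """Turn a list of {term: weight} dicts into (terms, rows of weights)."""
    terms = sorted({t for q in expanded_queries for t in q})
    matrix = [[q.get(t, 0) for t in terms] for q in expanded_queries]
    return terms, matrix


def cosine(u, v):
    """Cosine similarity of two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster(matrix, target_clusters):
    """Average-link agglomerative clustering over row indices.

    Merging stops when the desired number of clusters is reached.
    """
    clusters = [[i] for i in range(len(matrix))]

    def similarity(c1, c2):
        pairs = [(i, j) for i in c1 for j in c2]
        return sum(cosine(matrix[i], matrix[j]) for i, j in pairs) / len(pairs)

    while len(clusters) > max(target_clusters, 1):
        # Find the most similar pair of clusters and merge them.
        a, b = max(
            ((x, y) for x in range(len(clusters))
             for y in range(x + 1, len(clusters))),
            key=lambda p: similarity(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters
```

A distinctiveness-based stopping condition could be added by also
halting when the best remaining similarity drops below a chosen value.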

An eighth step is to name the clusters. Clusters are most
useful to a human observer if they bear useful names that
reflect the content. A semantic net hierarchy, combined
with term frequency in a reference corpus, is used to identify
the lowest term in the hierarchy that subsumes all the queries.
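
Finding the lowest subsuming term can be illustrated with a toy
hierarchy. The parent table below is hypothetical, each term is assumed
to have a single broader term, and the term-frequency component
mentioned in the text is left out of this sketch.

```python
# Hypothetical hierarchy: term -> parent (broader) term.
PARENT = {
    "puma": "big cat",
    "lion": "big cat",
    "big cat": "mammal",
    "wolf": "mammal",
    "mammal": "animal",
}


def ancestors(term):
    """Return the term followed by its broader terms, nearest first."""
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def name_cluster(terms):
    """Return the lowest term in the hierarchy that subsumes every term."""
    if not terms:
        return None
    common = ancestors(terms[0])
    for term in terms[1:]:
        others = set(ancestors(term))
        common = [t for t in common if t in others]
    return common[0] if common else None
```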

Folding Popularity Into Rankings. Many information
retrieval applications currently incorporate relevance
feedback into their judgements of how to rank search results
returned to a user. In all cases, however, the past systems
utilize explicit user feedback, not implicit feedback. That is,
they rank files by requiring a user to indicate what items he
is interested in, once a set of items is returned by a search.
Importantly, in the system according to the invention, the
system discerns implicit popularity rankings based on a
ranked set of user actions. The system then uses those
rankings to re-rank subsequent search results. The user actions
from which popularity may be determined include, but are
not limited to:

whether a file is placed in a projects folder or other work 
space 

whether a file is placed in a shopping cart
whether a file is purchased 

In addition, implicit popularity rankings may be derived
from one user or set of users and applied to a different user
or set of users. For example, if User A places a media file in
a projects folder or shopping cart, the information on her
activity can also be used to re-rank search results for User B,
who is in some way similar to User A. In another example, if
users with ".edu" e-mail addresses buy certain things, it makes
sense to re-rank search results to favor those things when
showing results to other ".edu" users. In the system according
to the invention, if registered users who work for advertising
agencies have placed certain items in their shopping carts,
other advertising agency employees can have their search
results re-ranked to favor those items. The opposite can be
true as well: the same system can be used to disfavor certain
items because they have been sold too many times to
advertising agencies, for example.
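
A minimal sketch of implicit popularity re-ranking follows. The numeric
weights attached to the ranked actions, the additive boost, and the
grouping of users by a simple label are all assumptions made for
illustration.

```python
from collections import defaultdict

# Ranked user actions; the numeric weights are assumptions.
ACTION_WEIGHT = {"project_folder": 1, "shopping_cart": 2, "purchase": 3}


def popularity_by_group(events):
    """Sum action weights per (user group, file).

    `events` is an iterable of (group, file_id, action) tuples.
    """
    scores = defaultdict(float)
    for group, file_id, action in events:
        scores[(group, file_id)] += ACTION_WEIGHT[action]
    return scores


def rerank(results, group, scores, boost=0.1):
    """Re-rank (file_id, relevance) pairs for a user in `group`.

    A positive `boost` favors popular files; a negative one disfavors them.
    """
    adjusted = [
        (file_id, relevance + boost * scores.get((group, file_id), 0.0))
        for file_id, relevance in results
    ]
    return sorted(adjusted, key=lambda pair: pair[1], reverse=True)
```

Passing a negative boost covers the opposite case described above,
where items that have sold too often to one group are pushed down.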

Retrieval system self-evaluation. The retrieval system
according to the invention does not answer any of the TREC
tracks: a media file is described by a short paragraph or by
keywords. Even though it is possible to develop a unique test
collection for purposes of evaluation, it will not necessarily
predict the performance of new systems, or even of existing
ones.

What is now described is a method for ongoing evaluation
of IR system performance based on search results combined
with user feedback. This approach enables the system to
alert a human system manager about observed degradation
in the system performance. As the performance evaluation
system becomes more knowledgeable through methods of
machine learning, the system is desirably able to change its
own parameters in order to improve its performance.
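
The alerting part of this idea could be approximated by tracking how
often searches are followed by a positive user action and comparing
that rate with a baseline. The class below is a sketch under that
assumption; the baseline, window size, and tolerance are hypothetical
parameters, and the machine-learning adjustment of system parameters is
not shown.

```python
from collections import deque


class PerformanceMonitor:
    """Tracks the share of searches followed by a positive user action."""

    def __init__(self, baseline, window=100, tolerance=0.10):
        self.baseline = baseline      # expected success rate (assumption)
        self.tolerance = tolerance    # allowed drop before alerting
        self.outcomes = deque(maxlen=window)

    def record(self, success):
        """Record whether a search led to user interest (True/False)."""
        self.outcomes.append(bool(success))

    def success_rate(self):
        """Return the success rate over the recent window, or None if empty."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def degraded(self):
        """True when the recent success rate falls below the tolerated level."""
        rate = self.success_rate()
        return rate is not None and rate < self.baseline - self.tolerance
```

A system manager could be notified whenever degraded() returns True
after a new search outcome is recorded.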



